
Introduction

The Table abstract data type

Implementations of the table data structure

Hash Tables

Bucket Array

Hash Function

Hash Code

Compression Functions

Collision-Handling Schemes

Collision likelihoods and load factors for hash tables

Hash Table Implementation

A simple Hash Table in operation

Strategies for dealing with collisions

Linear Probing

Double Hashing

Complexity of hash tables

Hash functions play a crucial role in the implementation of hash tables, providing a way to map keys to indices in a bucket array. However, the raw hash code generated for a key may not be directly usable as an index due to its potentially large range. The compression function addresses this issue, ensuring that the hash code is mapped into the desired range [0, N-1], where N is the size of the bucket array.


The Division Method:

One straightforward compression function is the division method, given by the formula:

h(k) = |k| mod N

Here, h(k) represents the compressed hash value for the key k, and N is the size of the bucket array. This method is simple and quick, but its effectiveness is enhanced when N is a prime number. Using a prime number helps spread out the distribution of hash codes, reducing the likelihood of collisions.

Let's illustrate this with an example. Suppose we want to hash the keys {200, 205, 210, 215, 220, ..., 600} into a bucket array of about 100 entries. Using the division method with N = 100, the 81 keys land in only 20 distinct buckets (0, 5, 10, ..., 95), so every key collides with at least three others. However, if we use N = 101 (a prime number), there are no collisions. This shows why choosing a prime N helps minimize collisions and gives a more even distribution of keys.
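The example above can be checked with a short Python sketch. The helper names division_compress and bucket_sizes are made up for this illustration and are not part of any library.

```python
from collections import Counter


def division_compress(k: int, n: int) -> int:
    """Division method: h(k) = |k| mod N."""
    return abs(k) % n


# The keys from the example: 200, 205, 210, ..., 600 (81 keys in total).
keys = range(200, 601, 5)


def bucket_sizes(n: int) -> Counter:
    """Count how many keys land in each bucket for a bucket array of size n."""
    return Counter(division_compress(k, n) for k in keys)


sizes_100 = bucket_sizes(100)  # non-prime N
sizes_101 = bucket_sizes(101)  # prime N

# For each N: number of buckets used, and the size of the fullest bucket.
print(len(sizes_100), max(sizes_100.values()))
print(len(sizes_101), max(sizes_101.values()))
```

With N = 100 only the buckets 0, 5, ..., 95 are ever used, while with N = 101 every key gets its own bucket.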


The MAD Method:


The multiply, add, and divide (MAD) method is a more sophisticated compression function designed to eliminate repeated patterns in sets of integer keys. The compression function is defined as:


h(k) = |ak + b| mod N

Here, N is a prime number, and a and b are randomly chosen nonnegative integers, with a picked so that a mod N ≠ 0. (If a were a multiple of N, every key would compress to the same value, |b| mod N.) The MAD method aims to create a hash function with good behavior, spreading keys evenly and minimizing collisions.

For integer keys, choosing a, b, and N in this way breaks up regular patterns in the key set, such as the multiples of 5 in the division-method example. The compressed values are then spread more evenly over the bucket array, which improves the overall performance of the hash table.
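As a rough sketch of the MAD method, the Python code below builds a compression function from randomly drawn a and b. The name make_mad, the fixed seed, and the choice to draw a from 1..N-1 and b from 0..N-1 are assumptions made for this illustration; the text only requires nonnegative integers with a mod N ≠ 0.

```python
import random


def make_mad(n: int, rng: random.Random):
    """Return a MAD compression function h(k) = |a*k + b| mod n.

    n is assumed to be prime. Drawing a from 1..n-1 guarantees a mod n != 0.
    """
    a = rng.randrange(1, n)  # 1 <= a <= n - 1
    b = rng.randrange(0, n)  # 0 <= b <= n - 1

    def h(k: int) -> int:
        return abs(a * k + b) % n

    return h


N = 101  # prime bucket-array size
h = make_mad(N, random.Random(42))  # seeded only to make runs repeatable

keys = range(200, 601, 5)  # same keys as in the division-method example
indices = [h(k) for k in keys]
print(len(set(indices)), "distinct indices for", len(indices), "keys")
```

Because N is prime and a is not a multiple of N, multiplying by a never maps two keys that differ modulo N to the same index, so these 81 keys stay collision-free for any valid choice of a and b.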


Collision Handling:


Once a hash function is established, the issue of collisions must be addressed. Collisions occur when two distinct keys hash to the same index. To handle collisions, various techniques are employed, such as chaining or open addressing.

Chaining: In this method, each bucket in the array maintains a linked list of elements that hash to the same index. If a collision occurs, the new element is added to the linked list at that index. This ensures that multiple elements can coexist at the same index without conflict.
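A minimal sketch of chaining in Python is shown below. The class name ChainedHashTable is hypothetical, and each bucket is a Python list standing in for the linked list described above. Python's built-in hash() supplies the hash code, and the division method compresses it.

```python
class ChainedHashTable:
    """Hash table with chaining; each bucket holds (key, value) pairs."""

    def __init__(self, n: int = 11):
        # One (initially empty) chain per bucket.
        self._buckets = [[] for _ in range(n)]

    def _index(self, key) -> int:
        # Hash code, then division-method compression.
        return abs(hash(key)) % len(self._buckets)

    def put(self, key, value) -> None:
        bucket = self._buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:  # key already present: overwrite its value
                bucket[i] = (key, value)
                return
        bucket.append((key, value))  # new key: add it to the chain

    def get(self, key):
        for k, v in self._buckets[self._index(key)]:
            if k == key:
                return v
        raise KeyError(key)


table = ChainedHashTable(n=11)
table.put(3, "a")
table.put(14, "b")  # 14 mod 11 == 3, so this collides with key 3
table.put(3, "c")   # existing key: value is replaced, chain does not grow
print(table.get(3), table.get(14))
```

Both keys live in bucket 3 without conflict; a lookup simply walks the chain at that index.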

Open Addressing: In contrast, open addressing attempts to find an alternative location for the colliding element within the same array. This may involve probing techniques like linear probing or double hashing to find the next available slot.
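The following Python sketch illustrates open addressing with linear probing. The class name LinearProbingTable is hypothetical, and the sketch deliberately leaves out deletion and resizing, both of which a full implementation would need to handle.

```python
class LinearProbingTable:
    """Open-addressing hash table using linear probing (no deletion)."""

    def __init__(self, n: int = 11):
        self._slots = [None] * n  # each slot is None or a (key, value) pair

    def _probe(self, key):
        """Yield slot indices starting at the home slot, wrapping around once."""
        n = len(self._slots)
        start = abs(hash(key)) % n
        for step in range(n):
            yield (start + step) % n

    def put(self, key, value) -> None:
        for i in self._probe(key):
            slot = self._slots[i]
            if slot is None or slot[0] == key:
                self._slots[i] = (key, value)
                return
        raise RuntimeError("table is full")

    def get(self, key):
        for i in self._probe(key):
            slot = self._slots[i]
            if slot is None:  # an empty slot ends the probe sequence
                break
            if slot[0] == key:
                return slot[1]
        raise KeyError(key)


probing = LinearProbingTable(n=11)
for key in (3, 14, 25):  # all three keys have home slot 3 (mod 11)
    probing.put(key, f"value-{key}")
print([slot[0] if slot else None for slot in probing._slots])
```

Key 3 takes its home slot, and the colliding keys 14 and 25 move on to the next free slots, 4 and 5.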


In conclusion, the combination of a well-designed hash function and an effective collision resolution strategy forms the foundation of a robust hash table implementation. The division method and MAD method are examples of compression functions that contribute to creating hash functions with desirable properties, while collision handling techniques ensure the integrity and efficiency of the overall hash table structure. Understanding these concepts is fundamental for implementing efficient and reliable hash tables in various computer science applications.

Compression Function:


A compression function is a crucial component of a hash function. Its role is to transform a potentially large range of hash codes into a smaller, manageable range suitable for indexing into a bucket array. This ensures that the hash codes generated for keys can be mapped to valid indices within the bucket array, preventing errors such as negative indices or indices exceeding the array's capacity. The compression function helps maintain the efficiency and integrity of hash table operations by mapping keys to appropriate locations in the bucket array.
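To make the range argument concrete, the Python sketch below compresses a few raw hash codes, including a negative one and one far larger than the array, using the division method from earlier. The function name bucket_index and the sample codes are chosen only for illustration.

```python
def bucket_index(key, n: int) -> int:
    """Hash code step followed by division-method compression."""
    code = hash(key)  # raw hash code: may be negative or much larger than n
    return abs(code) % n


N = 13  # bucket-array size

# Raw hash codes that would be invalid if used directly as indices.
raw_codes = [-7, 0, 12, 13, 10**12]
compressed = [abs(code) % N for code in raw_codes]
print(compressed)

# The same pipeline applied to a string key; the exact index can vary
# between runs because Python randomizes string hashes per process.
print(bucket_index("apple", N))
```

Whatever the raw hash code is, the result always falls in the range [0, N-1] and is therefore a valid index into the bucket array.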